 realistic video


Meta's Movie Gen looks like a huge leap forward for AI video (but you can't use it yet)

Engadget

At this point, you probably either love the idea of making realistic videos with generative AI, or you think it's a morally bankrupt endeavor that devalues artists and will usher in a disastrous era of deepfakes we'll never escape from. It's hard to find middle ground. Meta isn't going to change minds with Movie Gen, its latest video creation AI model, but no matter what you think of AI media creation, it could end up being a significant milestone for the industry. Movie Gen can produce realistic videos alongside music and sound effects at 16 fps or 24 fps at up to 1080p (upscaled from 768 by 768 pixels). It can also generate personalized videos if you upload a photo, and crucially, it appears to be easy to edit videos using simple text commands.


Microsoft's AI tool can turn photos into realistic videos of people talking and singing

Engadget

Microsoft Research Asia has unveiled a new experimental AI tool called VASA-1 that can take a still image of a person -- or the drawing of one -- and an existing audio file to create a lifelike talking face out of them in real time. It has the ability to generate facial expressions and head motions for an existing still image and the appropriate lip movements to match a speech or a song. The researchers uploaded a ton of examples on the project page, and the results look good enough that they could fool people into thinking that they're real. While the lip and head motions in the examples could still look a bit robotic and out of sync upon closer inspection, it's still clear that the technology could be misused to easily and quickly create deepfake videos of real people. The researchers themselves are aware of that potential and have decided not to release "an online demo, API, product, additional implementation details, or any related offerings" until they're sure that their technology "will be used responsibly and in accordance with proper regulations."


'Out of this world': OpenAI's text-to-video tool Sora sets internet alight

Al Jazeera

OpenAI, the creator of ChatGPT, has unveiled a new form of artificial intelligence that creates realistic video based on text prompts, prompting stunned reactions online. The text-to-video model, named Sora, has "a deep understanding of language" and can generate "compelling characters that express vibrant emotions," OpenAI said in a blog post on Thursday. "Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background," the Microsoft-backed startup said. "The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world." OpenAI CEO Sam Altman on X invited users to suggest prompts for Sora before posting results that included realistic videos of two golden retrievers podcasting on top of a mountain, a grandmother making gnocchi, and marine animals taking part in a bicycle race on top of the ocean.


AI creates a realistic video of you dancing from a single still image

New Scientist

Want to jump on the latest TikTok dance trend, but don't have time to learn the moves? Now, an artificial intelligence (AI) can create a video of you dancing from a single still image. Tan Wang at Nanyang Technological University in Singapore and colleagues, including researchers at Microsoft, developed a model they call Disentangled Control for Referring Human Dance Generation in Real World (DisCo), which splits an image into three parts: the background, the foreground and the pose of the person in the shot.


New ChatGPT 4 To Be Introduced by Microsoft! Here's What Makes It Different

#artificialintelligence

The new ChatGPT 4 will soon be introduced by Microsoft. This new AI is expected to be a better version of the original ChatGPT. Microsoft Germany's Chief Technology Officer Andreas Braun was the first to reveal this AI innovation. "We will introduce GPT-4 next week ... we will have multimodal models that will offer completely different possibilities - for example, videos," said Braun during the recent event titled "AI in Focus - Digital Kickoff." According to Digital Trends' latest report, the new ChatGPT 4 will let users generate videos from text descriptions.


Can Everybody Sign Now? Exploring Sign Language Video Generation from 2D Poses

Ventura, Lucas, Duarte, Amanda, Giro-i-Nieto, Xavier

arXiv.org Artificial Intelligence

Sign Language is the primary means of communication of the Deaf community but barely known by the rest of the population. This situation creates difficulties in conversations between sign and non-sign language speakers, which are normally addressed with textual transcriptions of the spoken language, or the sign-speakers developing lipreading and oral communication skills. The communication barrier between sign and non-sign language speakers may be reduced in the coming years thanks to the recent advances in neural machine translation and computer vision. Recent works [5,6,9] are making steps towards sign language translation by automatically generating detailed human pose skeletons from spoken language. Skeletons are represented by 2D/3D coordinates of human joints also known as keypoints; given a set of estimated keypoints, one can visualize them as a wired skeleton connecting the modeled joints (see the middle row of Figure 1).
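The keypoint representation described above can be illustrated with a minimal sketch. The joint names, coordinates, and bone list here are hypothetical values chosen for illustration, not the layout used by the authors or by any particular pose estimator. The sketch only shows how a set of 2D keypoints plus a list of joint pairs yields the line segments of a wired skeleton.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical 2D keypoints (normalized x, y); names are illustrative only.
KEYPOINTS: Dict[str, Point] = {
    "neck": (0.50, 0.20),
    "right_shoulder": (0.40, 0.25),
    "right_elbow": (0.35, 0.40),
    "right_wrist": (0.33, 0.55),
    "left_shoulder": (0.60, 0.25),
    "left_elbow": (0.65, 0.40),
    "left_wrist": (0.67, 0.55),
}

# Hypothetical pairs of joints to connect when drawing the skeleton.
BONES: List[Tuple[str, str]] = [
    ("neck", "right_shoulder"),
    ("right_shoulder", "right_elbow"),
    ("right_elbow", "right_wrist"),
    ("neck", "left_shoulder"),
    ("left_shoulder", "left_elbow"),
    ("left_elbow", "left_wrist"),
]


def skeleton_segments(
    keypoints: Dict[str, Point], bones: List[Tuple[str, str]]
) -> List[Tuple[Point, Point]]:
    """Return one (start, end) segment per bone whose two joints were both detected."""
    segments = []
    for start, end in bones:
        # Skip bones with a missing endpoint, e.g. a joint the estimator did not find.
        if start in keypoints and end in keypoints:
            segments.append((keypoints[start], keypoints[end]))
    return segments


segments = skeleton_segments(KEYPOINTS, BONES)
```

Each returned segment can be passed to any 2D drawing routine to render the wired skeleton. A 3D variant would only change the point type to three coordinates.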


AI Applied To Video Conferencing Kicks It Up Several Notches

#artificialintelligence

NVIDIA demonstrates an AI tool that tilts a whole head so that it appears you are looking at somebody instead of a camera off to the side.

Video conferencing has become the new business travel, thanks to Covid-19, and we're going to do more of it even after the virus. Thanks to the virus, though, more research is going into improving it, including a host of new AI techniques demonstrated as part of NVIDIA Maxine, a platform of AI video tools NVIDIA is licensing to partners. The ability of AI to help with video compression and upresolution is growing. Already, several tools to increase the resolution of old video are on the market, and before long we'll be watching old SD TV content in HD on a regular basis, and the surface has barely been scratched. While Maxine offers AI-based upscaling, video conferencing can make use of more than just video compression techniques.


The Best (And Scariest) Examples Of AI-Enabled Deepfakes

#artificialintelligence

There are positive uses for deepfake technology like making digital voices for people who lost theirs or updating film footage instead of reshooting it if actors trip over their lines. However, the potential for malicious use is of grave concern, especially as the technology gets more refined. There has been tremendous progress in the quality of deepfakes since only a few years ago when the first products of the technology circulated. Since that time, many of the scariest examples of artificial intelligence (AI)-enabled deepfakes have technology leaders, governments, and media talking about the perils it could create for communities. The first exposure to deepfakes for most of the general public happened in 2017.


Canny AI: Imagine world leaders singing

#artificialintelligence

Deep Learning is really starting to establish itself as a major new tool in visual effects. Currently the tools are still in their infancy, but they are changing the way visual effects can be approached. Instead of a pipeline consisting of modelling, texturing, lighting and rendering, these new approaches are hallucinating or plausibly creating imagery that is based on training data sets. Machine Learning, the superset of Deep Learning, and similar approaches have had great success in image classification, image recognition and image synthesis. At fxguide we covered Synthesia in the UK, a company born out of research first published as Face2Face.


AI-Powered Gun Detection Is Coming to Mosques Worldwide Following Christchurch Shootings

#artificialintelligence

In March, a gunman walked into two mosques in Christchurch, New Zealand, opened fire, and killed dozens of worshippers. According to a police official, the suspected gunman was arrested 36 minutes after police were called to the scene. Now, a tech company believes its smart security cameras can prevent attacks like the tragedy in Christchurch, and says it plans to install its AI-powered systems in mosques around the world. Athena Security, the tech company behind the security system, and Al-Ameri International Trading announced the Keep Mosques Safe initiative last week. Al-Ameri International Trading, along with several Islamic non-profit groups, will fund the Keep Mosques Safe effort.